Contents
Title and Copyright Information
Preface
Audience
Documentation Accessibility
Related Documents
Conventions
Backus-Naur Form Syntax
Part I Administration
1
Introducing Oracle Big Data Appliance
1.1
What Is Big Data?
1.1.1
High Variety
1.1.2
High Complexity
1.1.3
High Volume
1.1.4
High Velocity
1.2
The Oracle Big Data Solution
1.3
Software for Big Data
1.3.1
Software Component Overview
1.4
Acquiring Data for Analysis
1.4.1
Hadoop Distributed File System
1.4.2
Apache Hive
1.4.3
Oracle NoSQL Database
1.5
Organizing Big Data
1.5.1
MapReduce
1.5.2
Oracle Big Data SQL
1.5.3
Oracle Big Data Connectors
1.5.3.1
Oracle SQL Connector for Hadoop Distributed File System
1.5.3.2
Oracle Loader for Hadoop
1.5.3.3
Oracle Data Integrator Application Adapter for Hadoop
1.5.3.4
Oracle XQuery for Hadoop
1.5.3.5
Oracle R Advanced Analytics for Hadoop
1.5.4
Oracle R Support for Big Data
1.6
Analyzing and Visualizing Big Data
2
Administering Oracle Big Data Appliance
2.1
Monitoring Multiple Clusters Using Oracle Enterprise Manager
2.1.1
Using the Enterprise Manager Web Interface
2.1.2
Using the Enterprise Manager Command-Line Interface
2.2
Managing Operations Using Cloudera Manager
2.2.1
Monitoring the Status of Oracle Big Data Appliance
2.2.2
Performing Administrative Tasks
2.2.3
Managing CDH Services With Cloudera Manager
2.3
Using Hadoop Monitoring Utilities
2.3.1
Monitoring MapReduce Jobs
2.3.2
Monitoring the Health of HDFS
2.4
Using Cloudera Hue to Interact With Hadoop
2.5
About the Oracle Big Data Appliance Software
2.5.1
Software Components
2.5.2
Unconfigured Software
2.5.3
Allocating Resources Among Services
2.6
About the CDH Software Services
2.6.1
Where Do the Services Run on a Single-Rack CDH Cluster?
2.6.2
Where Do the Services Run on a Multirack CDH Cluster?
2.6.3
About MapReduce
2.6.4
Automatic Failover of the NameNode
2.6.5
Automatic Failover of the ResourceManager
2.6.6
Map and Reduce Resource Configuration
2.7
Effects of Hardware on Software Availability
2.7.1
Logical Disk Layout
2.7.2
Critical and Noncritical CDH Nodes
2.7.2.1
High Availability or Single Points of Failure?
2.7.2.2
Where Do the Critical Services Run?
2.7.3
First NameNode Node
2.7.4
Second NameNode Node
2.7.5
First ResourceManager Node
2.7.6
Second ResourceManager Node
2.7.7
Noncritical CDH Nodes
2.8
Managing a Hardware Failure
2.8.1
About Oracle NoSQL Database Clusters
2.8.2
Prerequisites for Managing a Failing Node
2.8.3
Managing a Failing CDH Critical Node
2.8.4
Managing a Failing Noncritical Node
2.9
Stopping and Starting Oracle Big Data Appliance
2.9.1
Prerequisites
2.9.2
Stopping Oracle Big Data Appliance
2.9.3
Starting Oracle Big Data Appliance
2.10
Managing Oracle Big Data SQL
2.10.1
Adding and Removing the Oracle Big Data SQL Service
2.10.2
Allocating Resources to Oracle Big Data SQL
2.11
Security on Oracle Big Data Appliance
2.11.1
About Predefined Users and Groups
2.11.2
About User Authentication
2.11.3
About Fine-Grained Authorization
2.11.4
About On-Disk Encryption
2.11.5
Port Numbers Used on Oracle Big Data Appliance
2.11.6
About Puppet Security
2.12
Auditing Oracle Big Data Appliance
2.12.1
About Oracle Audit Vault and Database Firewall
2.12.2
Setting Up the Oracle Big Data Appliance Plug-in
2.12.3
Monitoring Oracle Big Data Appliance
2.13
Collecting Diagnostic Information for Oracle Customer Support
3
Supporting User Access to Oracle Big Data Appliance
3.1
About Accessing a Kerberos-Secured Cluster
3.2
Providing Remote Client Access to CDH
3.2.1
Prerequisites
3.2.2
Installing CDH on Oracle Exadata Database Machine
3.2.3
Installing a CDH Client on Any Supported Operating System
3.2.4
Configuring a CDH Client for an Unsecured Cluster
3.2.5
Configuring a CDH Client for a Kerberos-Secured Cluster
3.2.6
Verifying Access to a Cluster from the CDH Client
3.3
Providing Remote Client Access to Hive
3.4
Managing User Accounts
3.4.1
Creating Hadoop Cluster Users
3.4.1.1
Creating Users on an Unsecured Cluster
3.4.1.2
Creating Users on a Secured Cluster
3.4.2
Providing User Login Privileges (Optional)
3.5
Recovering Deleted Files
3.5.1
Restoring Files from the Trash
3.5.2
Changing the Trash Interval
3.5.3
Disabling the Trash Facility
3.5.3.1
Completely Disabling the Trash Facility
3.5.3.2
Disabling the Trash Facility for Local HDFS Clients
3.5.3.3
Disabling the Trash Facility for a Remote HDFS Client
4
Configuring Oracle Exadata Database Machine for Use with Oracle Big Data Appliance
4.1
About Optimizing Communications
4.1.1
About Applications that Pull Data Into Oracle Exadata Database Machine
4.1.2
About Applications that Push Data Into Oracle Exadata Database Machine
4.2
Prerequisites for Optimizing Communications
4.3
Specifying the InfiniBand Connections to Oracle Big Data Appliance
4.4
Specifying the InfiniBand Connections to Oracle Exadata Database Machine
4.5
Enabling SDP on Exadata Database Nodes
4.6
Configuring a JDBC Client for SDP
4.7
Creating an SDP Listener on the InfiniBand Network
Part II Oracle Big Data Appliance Software
5
Optimizing MapReduce Jobs Using Perfect Balance
5.1
What Is Perfect Balance?
5.1.1
About Balancing Jobs Across Map and Reduce Tasks
5.1.2
Ways to Use Perfect Balance Features
5.1.3
Perfect Balance Components
5.2
Application Requirements
5.3
Getting Started with Perfect Balance
5.4
Analyzing a Job's Reducer Load
5.4.1
About Job Analyzer
5.4.1.1
Methods of Running Job Analyzer
5.4.2
Running Job Analyzer as a Standalone Utility
5.4.2.1
Job Analyzer Utility Example
5.4.2.2
Job Analyzer Utility Syntax
5.4.3
Running Job Analyzer Using Perfect Balance
5.4.3.1
Running Job Analyzer Using Perfect Balance
5.4.3.2
Collecting Additional Metrics
5.4.4
Reading the Job Analyzer Report
5.5
About Configuring Perfect Balance
5.6
Running a Balanced MapReduce Job Using Perfect Balance
5.7
About Perfect Balance Reports
5.8
About Chopping
5.8.1
Selecting a Chopping Method
5.8.2
How Chopping Impacts Applications
5.9
Troubleshooting Jobs Running with Perfect Balance
5.10
Using the Perfect Balance API
5.10.1
Modifying Your Java Code to Use Perfect Balance
5.10.2
Running Your Modified Java Code with Perfect Balance
5.11
About the Perfect Balance Examples
5.11.1
About the Examples in This Chapter
5.11.2
Extracting the Example Data Set
5.12
Perfect Balance Configuration Property Reference
Part III Oracle Big Data SQL
6
Using Oracle Big Data SQL for Data Access
6.1
What Is Oracle Big Data SQL?
6.1.1
About Oracle External Tables
6.1.2
About the Access Drivers for Oracle Big Data SQL
6.1.3
About Smart Scan Technology
6.1.4
About Data Security with Oracle Big Data SQL
6.2
Installing Oracle Big Data SQL
6.2.1
Prerequisites for Using Oracle Big Data SQL
6.2.2
Performing the Installation
6.2.3
Running the Post-Installation Script for Oracle Big Data SQL
6.2.3.1
Running the bds-exa-install Script
6.2.3.2
bds-exa-install Syntax
6.3
Creating External Tables for Accessing Big Data
6.3.1
About the Basic CREATE TABLE Syntax
6.3.2
Creating an Oracle External Table for Hive Data
6.3.2.1
Obtaining Information About a Hive Table
6.3.2.2
Using the CREATE_EXTDDL_FOR_HIVE Function
6.3.2.3
Developing a CREATE TABLE Statement for ORACLE_HIVE
6.3.3
Creating an External Table for HDFS Files
6.3.3.1
Using the Default Access Parameters with ORACLE_HDFS
6.3.3.2
Overriding the Default ORACLE_HDFS Settings
6.4
About the External Table Clause
6.4.1
TYPE Clause
6.4.2
DEFAULT DIRECTORY Clause
6.4.3
LOCATION Clause
6.4.3.1
ORACLE_HDFS LOCATION Clause
6.4.3.2
ORACLE_HIVE LOCATION Clause
6.4.4
REJECT LIMIT Clause
6.4.5
ACCESS PARAMETERS Clause
6.5
About Data Type Conversions
6.6
Querying External Tables
6.6.1
Granting User Access
6.6.2
About Error Handling
6.6.3
About the Log Files
6.7
About Oracle Big Data SQL on Oracle Exadata Database Machine
6.7.1
Starting and Stopping the Big Data SQL Agent
6.7.2
About the Common Directory
6.7.3
Common Configuration Properties
6.7.3.1
bigdata.properties
6.7.3.2
bigdata-log4j.properties
6.7.4
About the Cluster Directory
6.7.5
About Permissions
7
Oracle Big Data SQL Reference
DBMS_HADOOP PL/SQL Package
CREATE_EXTDDL_FOR_HIVE
Example
CREATE TABLE ACCESS PARAMETERS Clause
Syntax Rules for Specifying Properties
ORACLE_HDFS Access Parameters
Default Parameter Settings for ORACLE_HDFS
Optional Parameter Settings for ORACLE_HDFS
ORACLE_HIVE Access Parameters
Default Parameter Settings for ORACLE_HIVE
Optional Parameter Values for ORACLE_HIVE
com.oracle.bigdata.colmap
com.oracle.bigdata.datamode
com.oracle.bigdata.erroropt
com.oracle.bigdata.fields
com.oracle.bigdata.fileformat
com.oracle.bigdata.log.exec
com.oracle.bigdata.log.qc
com.oracle.bigdata.overflow
com.oracle.bigdata.rowformat
com.oracle.bigdata.tablename
Static Data Dictionary Views for Hive
ALL_HIVE_DATABASES
ALL_HIVE_TABLES
ALL_HIVE_COLUMNS
DBA_HIVE_DATABASES
DBA_HIVE_TABLES
DBA_HIVE_COLUMNS
USER_HIVE_DATABASES
USER_HIVE_TABLES
USER_HIVE_COLUMNS
Glossary
Index